Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable critical attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks following different paradigms, while the inherent relations between these tasks are neglected. Moreover, given most of the research remains fragmented, we conduct an in-depth study of the advanced work from a new perspective of learning the part relationship. In this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions by part relationship learning for these important challenges. Furthermore, we conclude several promising lines of research in fine-grained visual parsing for future research.
translated by 谷歌翻译
Fine-grained visual recognition is to classify objects with visually similar appearances into subcategories, which has made great progress with the development of deep CNNs. However, handling subtle differences between different subcategories still remains a challenge. In this paper, we propose to solve this issue in one unified framework from two aspects, i.e., constructing feature-level interrelationships, and capturing part-level discriminative features. This framework, namely PArt-guided Relational Transformers (PART), is proposed to learn the discriminative part features with an automatic part discovery module, and to explore the intrinsic correlations with a feature transformation module by adapting the Transformer models from the field of natural language processing. The part discovery module efficiently discovers the discriminative regions which are highly-corresponded to the gradient descent procedure. Then the second feature transformation module builds correlations within the global embedding and multiple part embedding, enhancing spatial interactions among semantic pixels. Moreover, our proposed approach does not rely on additional part branches in the inference time and reaches state-of-the-art performance on 3 widely-used fine-grained object recognition benchmarks. Experimental results and explainable visualizations demonstrate the effectiveness of our proposed approach. The code can be found at https://github.com/iCVTEAM/PART.
translated by 谷歌翻译
During the deployment of deep neural networks (DNNs) on edge devices, many research efforts are devoted to the limited hardware resource. However, little attention is paid to the influence of dynamic power management. As edge devices typically only have a budget of energy with batteries (rather than almost unlimited energy support on servers or workstations), their dynamic power management often changes the execution frequency as in the widely-used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed performance, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We firstly identify this problem and then propose All-in-One, a highly representative pruning framework to work with dynamic power management using DVFS. The framework can use only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By re-configuring the model to the corresponding pruning ratio for a specific execution frequency (and voltage), we are able to achieve stable inference speed, i.e., keeping the difference in speed performance under various execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces their variance of inference latency for various frequencies, with minimal memory consumption of only one model and one soft mask.
translated by 谷歌翻译
A unidirectional imager would only permit image formation along one direction, from an input field-of-view (FOV) A to an output FOV B, and in the reverse path, the image formation would be blocked. Here, we report the first demonstration of unidirectional imagers, presenting polarization-insensitive and broadband unidirectional imaging based on successive diffractive layers that are linear and isotropic. These diffractive layers are optimized using deep learning and consist of hundreds of thousands of diffractive phase features, which collectively modulate the incoming fields and project an intensity image of the input onto an output FOV, while blocking the image formation in the reverse direction. After their deep learning-based training, the resulting diffractive layers are fabricated to form a unidirectional imager. As a reciprocal device, the diffractive unidirectional imager has asymmetric mode processing capabilities in the forward and backward directions, where the optical modes from B to A are selectively guided/scattered to miss the output FOV, whereas for the forward direction such modal losses are minimized, yielding an ideal imaging system between the input and output FOVs. Although trained using monochromatic illumination, the diffractive unidirectional imager maintains its functionality over a large spectral band and works under broadband illumination. We experimentally validated this unidirectional imager using terahertz radiation, very well matching our numerical results. Using the same deep learning-based design strategy, we also created a wavelength-selective unidirectional imager, where two unidirectional imaging operations, in reverse directions, are multiplexed through different illumination wavelengths. Diffractive unidirectional imaging using structured materials will have numerous applications in e.g., security, defense, telecommunications and privacy protection.
translated by 谷歌翻译
Graph neural networks (GNNs) are popular weapons for modeling relational data. Existing GNNs are not specified for attribute-incomplete graphs, making missing attribute imputation a burning issue. Until recently, many works notice that GNNs are coupled with spectral concentration, which means the spectrum obtained by GNNs concentrates on a local part in spectral domain, e.g., low-frequency due to oversmoothing issue. As a consequence, GNNs may be seriously flawed for reconstructing graph attributes as graph spectral concentration tends to cause a low imputation precision. In this work, we present a regularized graph autoencoder for graph attribute imputation, named MEGAE, which aims at mitigating spectral concentration problem by maximizing the graph spectral entropy. Notably, we first present the method for estimating graph spectral entropy without the eigen-decomposition of Laplacian matrix and provide the theoretical upper error bound. A maximum entropy regularization then acts in the latent space, which directly increases the graph spectral entropy. Extensive experiments show that MEGAE outperforms all the other state-of-the-art imputation methods on a variety of benchmark datasets.
translated by 谷歌翻译
从大脑的事件驱动和稀疏的尖峰特征中受益,尖峰神经网络(SNN)已成为人工神经网络(ANN)的一种节能替代品。但是,SNNS和ANN之间的性能差距很长一段时间以来一直在延伸SNNS。为了利用SNN的全部潜力,我们研究了SNN中注意机制的影响。我们首先使用插件套件提出了我们的注意力,称为多维关注(MA)。然后,提出了一种新的注意力SNN体系结构,并提出了端到端训练,称为“ ma-snn”,该体系结构分别或同时或同时延伸了沿时间,通道以及空间维度的注意力重量。基于现有的神经科学理论,我们利用注意力重量来优化膜电位,进而以数据依赖性方式调节尖峰响应。 MA以可忽略的其他参数为代价,促进了香草SNN,以实现更稀疏的尖峰活动,更好的性能和能源效率。实验是在基于事件的DVS128手势/步态动作识别和Imagenet-1K图像分类中进行的。在手势/步态上,尖峰计数减少了84.9%/81.6%,任务准确性和能源效率提高了5.9%/4.7%和3.4 $ \ times $/3.2 $ \ times $。在ImagEnet-1K上,我们在单个/4步res-SNN-104上获得了75.92%和77.08%的TOP-1精度,这是SNN的最新结果。据我们所知,这是SNN社区与大规模数据集中的ANN相比,SNN社区取得了可比甚至更好的性能。我们的工作阐明了SNN作为支持SNN的各种应用程序的一般骨干的潜力,在有效性和效率之间取得了巨大平衡。
translated by 谷歌翻译
数据爆炸和模型尺寸的增加推动了大规模机器学习的显着进步,但也使模型训练时间耗时和模型存储变得困难。为了解决具有较高计算效率和设备限制的分布式模型培训设置中的上述问题,仍然存在两个主要困难。一方面,交换信息的沟通成本,例如,不同工人之间的随机梯度是分布式培训效率的关键瓶颈。另一方面,较少的参数模型容易用于存储和通信,但是损坏模型性能的风险。为了同时平衡通信成本,模型容量和模型性能,我们提出了量化的复合镜下降自适应亚基(QCMD Adagrad),并量化正规化双平均平均自适应亚级别(QRDA ADAGRAD)进行分布式培训。具体来说,我们探讨了梯度量化和稀疏模型的组合,以降低分布式培训中每次迭代的通信成本。构建了基于量化梯度的自适应学习率矩阵,以在沟通成本,准确性和模型稀疏性之间达到平衡。此外,从理论上讲,我们发现大量化误差会引起额外的噪声,从而影响模型的收敛性和稀疏性。因此,在QCMD Adagrad和QRDA Adagrad中采用了具有相对较小误差的阈值量化策略,以提高信噪比并保留模型的稀疏性。理论分析和经验结果都证明了所提出的算法的功效和效率。
translated by 谷歌翻译
基于深度学习的超分辨率(SR)近年来由于其高图像质量性能和广泛的应用方案而获得了极大的知名度。但是,先前的方法通常会遭受大量计算和巨大的功耗,这会导致实时推断的困难,尤其是在资源有限的平台(例如移动设备)上。为了减轻这种情况,我们建议使用自适应SR块进行深度搜索和每层宽度搜索,以进行深度搜索和每层宽度搜索。推理速度与SR损失一起直接将其带入具有高图像质量的SR模型,同​​时满足实时推理需求。借用了与编译器优化的速度模型在搜索过程中每次迭代中的移动设备上的速度,以预测具有各种宽度配置的SR块的推理潜伏期,以更快地收敛。通过提出的框架,我们在移动平台的GPU/DSP上实现了实时SR推断,以实现具有竞争性SR性能的720p分辨率(三星Galaxy S21)。
translated by 谷歌翻译
置换矩阵构成了一个重要的计算构建块,这些构建块在各个领域中经常使用,例如通信,信息安全和数据处理。具有相对较大数量的基于功率,快速和紧凑型平台的输入输出互连的置换运算符的光学实现是非常可取的。在这里,我们提出了通过深度学习设计的衍射光学网络,以全面执行置换操作,可以使用被动的传播层在输入和视场之间扩展到数十万个互连,这些互连是在波长规模上单独构造的。 。我们的发现表明,衍射光网络在近似给定置换操作中的容量与系统中衍射层和可训练的传输元件的数量成正比。这种更深的衍射网络设计可以在系统的物理对齐和输出衍射效率方面构成实际挑战。我们通过设计不对对准的衍射设计来解决这些挑战,这些设计可以全面执行任意选择的置换操作,并首次在实验中证明了在频谱的THZ部分运行的衍射排列网络。衍射排列网络可能会在例如安全性,图像加密和数据处理以及电信中找到各种应用程序;尤其是在无线通信中的载波频率接近THZ波段的情况下,提出的衍射置换网络可以潜在地充当无线网络中的通道路由和互连面板。
translated by 谷歌翻译
波前调节器的限制空间散宽产品(SBP)阻碍了大型视野(FOV)上图像的高分辨率合成/投影。我们报告了一种深度学习的衍射显示设计,该设计基于一对训练的电子编码器和衍射光学解码器,用于合成/项目超级分辨图像,使用低分辨率波形调节器。由训练有素的卷积神经网络(CNN)组成的数字编码器迅速预处理了感兴趣的高分辨率图像,因此它们的空间信息被编码为低分辨率(LR)调制模式,该模式通过低SBP Wavefront调制器投影。衍射解码器使用薄的传播层处理该LR编码的信息,这些层是使用深度学习构成的,以在其输出FOV处进行全面合成和项目超级分辨图像。我们的结果表明,这种衍射图像显示可以达到〜4的超分辨率因子,表明SBP增加了约16倍。我们还使用3D打印的衍射解码器在THZ光谱上进行实验验证了这种衍射超分辨率显示器的成功。该衍射图像解码器可以缩放以在可见的波长下运行,并激发紧凑,低功率和计算效率的大型FOV和高分辨率显示器的设计。
translated by 谷歌翻译